Exploration of Contextual Constraints for Character Pre-Classification

Authors

  • Tin Kam Ho
  • George Nagy
Abstract

We present strategies and results for identifying the symbol type (lower case, upper case, digit, or punctuation and special symbols) of every character in a text document by using various kinds of information from neighboring characters. In the expectation of reasonable word and character segmentation for shape clustering, we designed several type recognition methods that depend on cluster n-grams, shape codes, and within-word context. On an ASCII test corpus of 925 articles that simulates perfect image-level processing, these methods achieve a substantial improvement over default assignment of all characters to lower case.
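To make the task and its baseline concrete, the following is a minimal sketch, not the authors' method. It assumes plain ASCII input, maps each character to one of the four symbol types named in the abstract, and measures how often the default "everything is lower case" assignment would be right. The function names `symbol_type` and `lowercase_baseline_accuracy` are hypothetical, and skipping whitespace when scoring is an assumption made here for illustration.

```python
import string


def symbol_type(ch: str) -> str:
    """Map one ASCII character to one of the four symbol types."""
    if ch in string.ascii_lowercase:
        return "lower"
    if ch in string.ascii_uppercase:
        return "upper"
    if ch in string.digits:
        return "digit"
    return "other"  # punctuation or special symbols


def lowercase_baseline_accuracy(text: str) -> float:
    """Fraction of non-whitespace characters that the all-lower-case
    default labels correctly (whitespace is skipped by assumption)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    correct = sum(1 for c in chars if symbol_type(c) == "lower")
    return correct / len(chars)


if __name__ == "__main__":
    sample = "In 2001, Ho and Nagy tested 925 articles."
    print(lowercase_baseline_accuracy(sample))
```

Any context-based method of the kind the abstract describes would be judged against this baseline figure on the same text.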

Similar resources

Named Entity Recognition Using a Character-based Probabilistic Approach

We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps word...

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of any handwritten document into structured text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

Combining Lightly-Supervised Text Classification Models for Accurate Contextual Advertising

In this paper we propose a lightly-supervised framework to rapidly build text classifiers for contextual advertising. In contextual advertising, advertisers often want to target a specific class of webpages most relevant to their product, which may not be covered by a pre-trained classifier. Moreover, the advertisers are only interested in the target class. Therefore, it is more suitable to m...

Deformed Systems for Contextual Text Recognition

A fuzzy method for incorporating contextual constraints into a text recognition system is presented here. The method takes as input all the internal results that an Isolated Character Classifier (ICC) computes for an input letter, instead of a unique output character. The internal result is handled here as a fuzzy set, which is then processed by a Deformed System. Such a Deformed System repr...

An Improved Pre-classification Method for Off-line Handwritten Chinese Character Using Four Corner Feature

Pre-classification can effectively improve the performance of handwritten Chinese character recognition. This paper presents a method that uses the four-corner feature for pre-classification of handwritten Chinese characters. Considering writing variations, we define a set of basic stroke structures and match them with the structures in the four corner regions of the character image. The matching result wi...

Publication date: 2001